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Abstract 

Rhinoviruses, formerly known as Human rhinoviruses, are the most common cause of air-borne upper respiratory tract 
infections in humans. Rhinoviruses belong to the family Picornaviridae and are divided into three species namely, Rhinovirus 
A, -B and -C, which are antigenically diverse. Genetic recombination is found to be one of the important causes for 
diversification of Rhinovirus species. Although emerging lineages within Rhinoviruses have been reported, their population 
structure has not been studied yet. The availability of complete genome sequences facilitates study of population structure, 
genetic diversity and underlying evolutionary forces, such as mutation, recombination and selection pressure. Analysis of 
complete genomes of Rhinoviruses using a model-based population genetics approach provided a strong evidence for 
existence of seven genetically distinct subpopulations. As a result of diversification, Rhinovirus A and -C populations are 
divided into four and two subpopulations, respectively. Genetically, the Rhinovirus B population was found to be 
homogeneous. Intra-species recombination was observed to be prominent in Rhinovirus A and -C species. Significant 
evidence of episodic positive selection was obtained for several sites within coding sequences of structural and non- 
structural proteins. This corroborates well with known phenotypic properties such as antigenicity of structural proteins. 
Episodic positive selection appears to be responsible for emergence of new lineages especially in Rhinovirus A. In summary, 
the Rhinovirus population is an ensemble of seven distinct lineages. In case of Rhinovirus A, intra-species recombination and 
episodic positive selection contribute to its further diversification. In case of Rhinovirus C, intra- and inter-species 
recombinations are responsible for observed diversity. Population genetics approach was further useful to analyze 
phylogenetic tree topologies pertaining to recombinant strains, especially when trees are derived using complete genomes. 
Understanding of population structure serves as a foundation for designing new vaccines and drugs as well as to explain 
emergence of drug resistance amongst subpopulations. 
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Introduction 

Rhinoviruses are known to infect upper and lower respiratory 
tract and cause common cold in humans. Though common cold is 
relatively mild in nature, it is a global socioeconomic burden [1,2]. 
Rhinoviruses are also associated with severe respiratory tract illnesses 
such as pneumonia [3], cystic fibrosis [4], bronchitis [5], chronic 
obstructive pulmonary disease [6], asthma [7] and whizzing 
illnesses in infants [8]. 

Rhinoviruses belong to the genus Enterovirus and family Picornavir- 
idae. Rhinoviruses were referred to as Human rhinoviruses (HRV) until 
very recently. The proposal to rename Human rhinovirus species as 
Rhinovirus, by removing host name has been approved by the 
International Committee on Taxonomy of Viruses (ICTV) and is 
made available online [9]. Accordingly, the term Rhinoviruses is 
being used in place of Human rhinoviruses and the three species of 
Rhinoviruses have been re-designated as Rhinovirus A, -B and -C. 
However, the serotypes of each of these species are still 
abbreviated as HRV-A, HRV-B and HRV-C [9]. In view of this, 
as an example, serotype 2 of Rhinovirus A species is referred to as 
HRV-A2. There are 77 and 25 known serotypes of HRV-A and 
HRV-B, respectively, based on cross-neutralization assays in cell 



culture [10,1 1]. There are 51 known types of HRV-C. These types 
are proposed on the basis of molecular phylogeny analysis of VP 1 
coding region since HRV-C strains are not culturable in vitro 
[12,13]. 

Rhhwviruses are small, non-enveloped, single-stranded RNA 
viruses with a positive sense genome of ~7200 bases. The genome 
contains a single open reading frame (ORF) which is flanked by 5 ' 
and 3 'untranslated regions (UTR). The ORF encodes a single 
polyprotein that is post-translationally cleaved into four structural 
proteins (VP1, VP2, VP3 and VP4) and seven non-structural 
proteins (2A, 2B, 2C, 3A, 3B, 3C and 3D). Sequences of various 
sub-genomic regions such as 5'UTR [14], VP4/VP2 [15], VP1 
[16], 3D polymerase [17] and partial 2 A [18] have been used to 
study evolutionary relatedness of HRV-A, -B and -C strains/ 
isolates. 

Serotypes of Rhinovirus A and -B species have been classified 
based on their receptor specificity into major and minor receptor 
group viruses. The major receptor group viruses are those which 
use intracellular adhesion molecule- 1 (ICAM) receptors. The 
minor receptor group viruses use members of low-density 
lipoprotein receptor (LDLR) family for entry into the host cell. 
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Twelve serotypes of Rhinovirus A (HRV-A1A, -A IB, -A2, -A23, - 
A25, -A29, -A30, -A31, -A44, -A47, -A49 and -A62) belong to the 
minor receptor group of viruses. The remaining serotypes of 
Rhinovirus A as well as all the serotypes of Rhinovirus B belong to the 
major receptor group [19,20]. 

Genetic diversity of Rhinoviruses is attributed to a high mutation 
rate, which is due to the low fidelity of RNA-dependent RNA 
polymerase that lacks proof-reading activity. In Pkornaviruses, RNA 
polymerase-mediated error rate has been estimated to be between 
10 3 and 10 4 errors/nucleotide/cycle of replication [21]. 
Additionally, the genetic recombination has also been reported 
to be a cause behind diversity in Rhinovirus species [22,23]. 
Recombination is likely to occur through co-infections as well as 
through successive infections by strains of different species of 
Rhinoviruses. Coinfection of a Rhinovirus with other respiratory 
viruses can also result in recombination [24]. Therefore, use of 
molecular phylogenetic analysis, a commonly used approach for 
classification of Rhinoviruses, may lead to inappropriate classifica- 
tion, especially in case of recombinant strains [25] . 

The present study was undertaken to assess the extent of 
recombination and its impact on genetic diversity in a Rhinovirus 
population. A Bayesian-based population genetics approach [26], 
which does not depend on phylogenetic analysis, was used to carry 
out population stratification studies and to identify diversifying 
lineages within the Rhinovirus population. Role of selection pressure 
was also analyzed to study the contributions of episodic positive 
selection in Rhinovirus evolution and in diversification of Rhinovirus 
population into distinct lineages. 

Materials and Methods 

Compilation and Curation of Dataset 

Complete genome sequences of 179 Rhinovirus strains were 
compiled from GenBank [27] and curated using the information 
of serotype/ strain made available by the Picornaviridae study group 
of IGTV [28]. There are 111, 40 and 28 entries respectively for 
Rhinovirus A, -B and -C species. The serotype data along with the 
GenBank accession numbers are provided in Table SI. 

Inference of Population Genetic Structure of Rhinoviruses 

Rhinovirus population diversity was analyzed using the dataset of 
179 complete genome sequences. The major steps in the analysis 
are multiple sequence alignment (MSA), extraction of parsimony- 
informative (PI) sites, linkage equilibrium analysis, identification 
and analysis of genetic structure using Bayesian-based clustering 
approach and verification of genetic structure using Analysis of 
Molecular Variance (AMOVA) test. All of these steps, along with 
the software and various parameters used are described below. 

Multiple sequence alignment and extraction of 
parsimony informative sites. MSA of 1 79 complete genome 
sequences was carried out using MUSCLE program in MEGA 
5.05 [29]. The parsimony informative (PI) sites were then 
extracted using MEGA and used as an input for STRUCTURE 
2.3.3 [26] and LIAN 3.5 [30] programs. A PI site is defined as the 
site that contains at least two types of nucleotide bases and at least 
two of them occur with a minimum frequency of two. The gaps 
were treated as the 5 th nucleotide state and ambiguous characters 
were considered as 'missing values'. A total of 5689 PI sites were 
obtained and these were referred to as 'loci'. The positions of loci 
in the whole genome alignment were used to generate genetic 
distance map. 

Linkage equilibrium analysis. The null hypothesis of 
linkage equilibrium was tested using LIAN 3.5 program [30]. 
This program implements Monte Carlo simulations to obtain 



simulated datasets where loci are resampled without replacement. 
This program computes a standardized index of association, I s A , 
which is a measure of the degree of haplotype-wide linkage derived 
from a dataset and is given by, I S a= [l/(« "!)][( Fi)/ J^e)- 1] • Here, 
Vd is the observed variance of pairwise distances between 
haplotypes (groups of closely related sequences that apparently 
share a recent common ancestry) and Ve is the variance expected 
when all loci are in linkage equilibrium. The term [(V D / V E )-Y\ 
represents a function of rate of recombination, which equals to 
zero for being in the state of linkage equilibrium. The number of 
loci analyzed is represented by e. 

In addition to I a, two additional measures of linkage 
disequilibrium (LD) viz., | D' | and r were also computed using 
DnaSP 5 program [31]. | D' | is the absolute value of the difference 
between the observed and the expected haplotype frequency in the 
absence of LD, which is normalised by the maximum (or 
minimum) possible value of this difference. The squared value of 
the difference between the observed and the expected haplotype 
frequency normalised by the variance of the allele frequency, is 
denoted by r 2 [32]. 

Identification and analysis of genetic structure. In order 
to analyze genetic structure of the Rhinovirus population, a 
Bayesian model-based clustering approach implemented in the 
STRUCTURE 2.3.3 program [26] was used. The STRUCTURE 
program provides various ancestry models to deduce population 
structure and to identify distinct subpopulations each of which is 
characterized by a set of allele frequencies at every locus. This 
method attempts to probabilistically assign individuals to popula- 
tions, while simultaneously estimating allele frequencies in the 
populations. We used the admixture and linkage models with 
correlated allele frequencies between populations for the analysis. 
These models account for individuals having mixed ancestry 
(potential recombinants) and also help to probabilistically assign 
such individuals to two or more populations. The linkage model is 
developed to account for potential linkage between loci and 
thereby to avoid underestimation or overestimation of the 
admixed individuals [26,33]. 

The admixture model was built using 20,000 burn-in and 
40,000 Markov Chain Monte Carlo (MCMC) run lengths. Default 
values were used for other parameters such as allele frequency 
parameter (/), Dirichlet parameter for degree of admixture (a), etc. 
The optimum number of clusters (K„p^j represents the number of 
subpopulations. To determine the K opt , twenty independent 
simulation runs were carried out for each value of K, ranging 
from 1 to 13. This analysis led to the calculation of posterior 
probability of data for a given value of K and associated standard 
deviation, which are used to calculate AKas suggested in [34]. The 
plot of iTversus AiTwas finally used to determine the value of K opb 
which is represented by the highest peak in the plot K„p t . Linkage 
model was built based on 20,000 burn-in, 40,000 MCMC run 
lengths and 10,000 admixture burn-in length. 

Validation of Genetic Structure Hypothesis 

The population genetic structure in Rhinoviruses obtained by 
STRUCTURE 2.3.3 program was validated using F ST values 
(Fixation indices) obtained by applying Analysis of Molecular 
Variance (AMOVA) test implemented in ARLEQUIN 3.11 
software package [35]. 

Molecular Phylogeny Analysis of Rhinovirus Population 

Molecular phylogeny analysis (MPA) was carried out using the 
dataset of 179 complete genomes of Rhinoviruses using the 
Neighbor-joining (NJ), the Maximum likelihood (ML) and the 
Maximum parsimony (MP) methods provided in MEGA [29]. 
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Multiple sequence alignment was obtained using MUSCLE 
program in MEGA and it was used to generate the phylogenetic 
trees. MSA file in mega format is provided as Text SI. Bootstrap 
analysis was carried out by sampling 1000 replicates to estimate 
essential correctness of resultant phylogenetic tree topology/ies 
generated using all the three methods. The FigTree 1.2.3 software 
[36] was used to visualize the phylogenetic trees. 

Recombination Analysis 

Complete genome alignment of all 179 Rhinovirus strains 
obtained using MEGA was screened for the presence of potential 
recombinant sequences and identification of potential parents 
(major and minor) using various recombination detection methods 
implemented in the RDP4 program [37]. These methods include 
RDP [38], GENCONV [39], BOOTSCAN [40], MAXICHI 
[41], CHIMAERA [42], SiScan [43] and 3SECi [44]. A stringent 
p-value cutoff of less than 0.00001 was applied and the multiple 
comparison correction setting was kept "on" (as default). A 
sequence is considered as potential recombinant only if it is 
significandy (with p<0. 00001) identified as a recombinant by 
more than any of the two methods mentioned above. The 
potential recombinant Rhinovirus strains obtained using RDP4 were 
cross-checked if they were identified as admixed by the 
STRUCTURE program. The individual is said to be admixed 
when it has >5% ancestry (or >0.05 membership score) to belong 
to more than one subpopulation. 

Selection Pressure Analysis 

In order to examine potential evidence of selection pressure in 
the codons of Rhinovirus, the potential recombinant sequences 
identified by RDP4 program were excluded. Thereby the dataset 
consisting of 133 Rhinovirus strains (86 HRV-A, 33 HRV-B and 14 
HRV-C) was used for the analysis of selection pressure. Using 
these entries, 12 separate datasets were prepared which include 
individual coding sequences of all the 1 1 proteins (VP1 to VP4, 2 A 
to 2C and 3A to 3D) as well as the complete coding sequence for 
the polyprotein. 

The outcome of positive selection analysis is largely dependent 
on the choice of the MSA program used. Effect of removing 
unreliable alignment regions during analysis of positive selection is 
recently reported [45]. This study indicates that performance of 
positive selection identification is best for the MSA obtained using 
program PRANK and worst for MSA obtained using the 
CLUSTALW program. The MUSCLE and MAFFT programs 
were found to perform better and were ranked next to the 
PRANK. Therefore, each of the 12 datasets was independendy 
subjected to the codon alignment using PRANK, MUSCLE and 
MAFFT programs implemented in the GUIDANCE server [46]. 
GUIDANCE server is useful to check the quality of the codon 
alignment because it provides confidence score for entire 
alignment as well as for each base and column within the 
alignment. We compared GUIDANCE confidence scores for 
alignments obtained using PRANK, MUSCLE and MAFFT 
programs for all the 12 datasets. Since GUIDANCE confidence 
scores for MUSCLE-based alignments were found to be higher for 
most of the 12 datasets, alignments derived by MUSCLE were 
used for selection pressure analysis. 

All the 12 datasets were independendy analyzed for the 
evidence of selection pressure (positive or negative) using three 
different maximum-likelihood methods such as Single Likelihood 
Ancestor Counting (SLAC) [47], Fixed Effect Likelihoods (FEL) 
[47] and Internal Fixed Effect Likelihoods (IFEL) [48]. These 
methods are available at the Datamonkey web-server, which is a 
part of the HYPHY package [49-51]. These three methods 



basically estimate the ratio of the non-synonymous to synonymous 
mutations (dN/dS or Cfl) at every codon in the alignment. The 
results were analyzed using default setting oip = 0.l. The run for 
identification of the best model was carried out by using 
automated model selection tool at Datamonkey server. The 
general time reversible (GTR) model is found to be the best model 
for Rhinovirus datasets and thus subsequendy used during selection 
pressure analysis employing all of the three methods mentioned 
above. 

All the three methods enable detection of the sites that are 
under pervasive positive selection across all the lineages in the 
phylogenetic tree. In order to account for the sites that are under 
episodic positive selection (sites that are positively selected only in a 
few lineages), a recently developed method, namely, Mixed Effects 
Model of Evolution (MEME) was used [52]. MEME is available at 
the Datamonkey server. MEME method integrates both site-to-site 
rate variation (by employing FEL along the sites) as well as lineage 
to lineage rate variation (by employing Random-effects likelihood 
across the branches) to detect episodic diversifying selection. Thus, 
MEME method is helpful to infer the episodes of diversifying 
evolution, which may affect only a small subset(s) of lineages even 
when the majority of the lineages are subjected to the purifying 
selection. 

In order to confirm the evidence of episodic diversifying 
selection in HRV-A, -B and -C, we have also used Branch-site 
Random-effects likelihood (BSR) method [53], which is available 
at the Datamonkey server [49-51]. The BSR method helps in 
identifying nodes/branches in the NJ tree that undergo episodic 
positive selection. The BSR method provides mapping of the 
proportion of sites that are under episodic positive selection and 
also indicates the strength of selection on relevant nodes/branches 
of phylogenetic tree. Therefore the dataset consisting of complete 
coding sequences of polyprotein was used to study the effect of 
episodic selection on diversification of lineages in each of HRV-A, 
-B and -C. 

Genotype-phenotype Correlation Analysis 

The amino acid residues corresponding to the positively selected 
codons were mapped on the proteins for which three-dimensional 
structures are available in the Protein Databank (PDB) [54]. 
Functional implications of such amino acids in antigenicity, drug- 
resistance or receptor attachment, etc. were also analyzed as the 
experimental data for antigenic sites [55,56] and receptor-binding 
sites [57,58] for viruses of major and minor receptor groups are 
available. The hydrophobic drug-binding site within VP1 has 
already been experimentally characterized [16]. 

The structures of proteins of HRV-A2 serotype (capsid proteins 
[PDB ID: 1FPN], 2A protease [PDB ID: 2HRV], 3D polymerase 
[PDB ID: 1XR6]) were used to represent all the HRV-A minor 
receptor group serotypes. The structures of proteins of HRV-A16 
serotype (capsid proteins [PDB ID: 1AYM], 3D polymerase [PDB 
ID: 1XR7]) and of HRV-B 14 serotype (capsid proteins [PDB ID: 
1R09], 3D polymerase [PDB ID: 1XR5]) were respectively used to 
represent all HRV-A and HRV-B major receptor group serotypes. 
The three-dimentional structures are visualized and rendered 
using SwissPDB viewer 3.7 [59]. 

Results 

In this study, a comprehensive analysis of the complete genome 
sequences of Rhinoviruses has been carried out to understand the 
role of evolutionary forces such as recombination and selection 
pressure to study population diversity. 
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In case of Rhinoviruses, role of recombination has been 
established already [12,22-24,60-62]. The extent of recombina- 
tion in the population of Rhinoviruses was studied using the 
STRUCTURE program. This program has been developed to 
infer population structure of haploid, diploid or polyploid 
organisms [33]. It was initially used to identify various ethnic 
groups in human populations [63]. Subsequendy, the program has 
been gainfully applied to study genetic structure and admixture in 
various diploid populations such as those of chimpanzee [64], 
chicken [65], etc. Its applicability to infer population structure in 
haploid organism namely Helicobacter pylori has been demonstrated 
for the first time in 2003 [66]. It was later used to analyze haploid 
populations of organisms such as Plasmodium falciparum [67], 
Hepatitis B virus [68], species of the genus Begomovirus [69], etc. 

Use of the STRUCTURE program to study population of 
Rhinovirus requires validation of the hypothesis that most of the loci 
are in the state of linkage equilibrium. Therefore, analysis of extent 
of linkage disequilibrium (LD) in the Rhinovirus dataset was carried 
out. 

Linkage Equilibrium Analysis 

In order to test the degree of linkage equilibrium present within 
Rhinovirus genomes, a standardized index of association (I S A ) 
between PI sites across genomes was calculated using LIAN 3.5 
program [30]. The value of f A is expected to be zero in case of 
free recombination. The f A value obtained for Rhinoviruses was 
found to be 0.0666 (/><10~ 4 , 10000 replicates). The value of f A , 
though apparendy small is found to be significant by the virtue of 
p-value criteria. It suggests weak evidence of linkage disequilib- 
rium (LD) and indicates nonrandom association of polymorphic 
loci. In order to confirm low evidence of LD, plots of | D' | and r 
against distance between loci were obtained (data not shown). 
Average values of | D' | and that of / were calculated and found to 
be 0.5409 and 0.0613, respectively. These values also substantiate 
weak evidence of LD. These findings together indicate that the 
polymorphic loci are only weakly correlated and therefore the use 
of STRUCTURE program to study genetic diversity of Rhinovirus 
population is justified. 

Analysis of Rhinovirus Population Structure 

In order to study the population structure and to identify 
recombinants, if any, the admixture model in the STRUCTURE 
program was used. Twenty independent simulation runs using 
admixture model (see methods for details) were carried out for 
K= 1 to 13. The K„p t of 7 was determined based on a clear peak of 
AK (the rate of change of posterior probability given K) obtained 
using the plot of K versus AK as shown in Figure 1. These 
observations suggest existence of seven genetically distinct 
subpopulations of Rhinoviruses. Evidence of seven subpopulations 
was further confirmed by the AMOVA test implemented in the 
ARLEQUIN 3.11 software [35]. The test statistic, F ST value of 
0.44 with p = 0.0, strongly suggests that the differences among 
emerging and existing subpopulations of Rhinoviruses are statisti- 
cally significant. Thus, the K„p, of 7 indicates that genetic structure 
of Rhinoviruses consists of three known species viz., HRV-A, -B and 
-C as well as three additional subpopulations emerging from 
HRV-A and one additional subpopulation from HRV-C. Sche- 
matic representation of subpopulations of Rhinovirus population at 
R~= 7 (using admixture model) is shown in Figure 2. 

Genetic structure of Rhinovirus population at varying values of K 
(3 to 7) was also analyzed as there are three known species, 
however, the K 0 p t was found to be 7. The STRUCTURE program 
assigns individuals to one or more subpopulations based on 
relative membership score that vary from 0 to 1. Membership 



score of 1 is an indicator of being a member of a cluster, whose 
members have evolved from a single ancestor. The program also 
facilitates identification of admixed individuals, which could have 
evolved due to events such as recombination of strains that belong 
to different subpopulations. As a result, an admixed individual is 
assigned with the membership scores to belong to respective 
subpopulations, which clearly indicate mixed ancestry. In Figure 2, 
the bar plot depicts various subpopulations by distinct colors. The 
admixed strains are represented by multiple colors in the bar plot 
based on their membership scores. 

At K= 3, all the three known species of Rhinoviruses were found 
to cluster separately as expected. Analysis carried out at K= 4, 
indicated occurrence of admixed strains within Rhinovirus A 
population. This suggests that HRV-A is subdivided into a 
(potentially) pure subpopulation A and one or more admixture 
subpopulations. Additionally, a distinct subpopulation is found to 
be emerging from HRV-A population that consists of 1 2 serotypes 
such as HRV-A7, -A20, -A28, -A36, -A51, -A58, -A65, -A71, - 
A88, -A89, -A102 and -A103. Among these serotypes, HRV-A89 
and -A36 show the estimated membership scores of 1 for the 4 th 
cluster while serotypes viz. HRV-A58, -A7, -A88 showed 
membership scores of >0.70 for the 4 th cluster. The estimated 
membership scores for remaining serotypes were found to be in 
the range of 0.47 to 0.50, which indicated that these members 
could also belong to another subpopulation/(s). Hence, analysis of 
the populations was undertaken at K= 5. 

Analysis of population stratification at R~= 5 revealed that HRV- 
B and -C populations remain undivided and HRV-A strains 
clustered as three subpopulations viz. subpopulation A and two 
admixture subpopulations Al and A2. The Al subpopulation 
includes strains of five HRV-A serotypes viz. HRV-A7, -A36, - 
A58, -A88 and -A89 with membership scores ranging from 0.7 to 
1. The A2 subpopulation includes strains of seventeen HRV-A 
serotypes viz. HRV-A8, -A12, -A20, -A28, -A45, -A46, -A51, - 
A53, -A65, -A68, -A71, -A78, -A80, -A95, -A101, -A102 and - 
A103. The membership scores of strains of serotypes belonging to 
A2 subpopulation were in the range of 0.72 to 0.999, with a few 
exceptions. These exceptions include serotypes viz. HRV-A8, - 
A95, -A45, -A12 and -A78 having scores in the range of 0.45 to 

0. 50. In order to confirm the classification of these five serotypes 
which have low membership scores, analysis at K= 6 was carried 
out. 

At K= 6, HRV-C population is subdivided into two subpopu- 
lations viz. CI and C2. Majority of the HRV-C strains belonged to 
the CI subpopulation. The CI subpopulation includes nineteen 
HRV-C strains showing membership scores in the range of 0.63 to 

1. While the C2 subpopulation consists of nine HRV-C strains 
showing membership scores in the range of 0.56 to 1. The 
members of the C2 subpopulation include two HRV-C6 strains 
(strain 026 [GenBank: EF582387] and strain HRV- 
C06_pl031_sR2724_2009 [GenBank: JN990702]), two HRV- 
C3 strains (QPM [GenBank: EF186077] and HRV- 
C03_pl280_s6359_1999 [GenBank: JN798567]), isolate LZ651 
[GenBank: JF317016], HRV-C 1 strain NAT001 [GenBank: 
EF077279], HRV-C 10 strain QCE [GenBank: GQ323774], 
HRV-C 7 strain NY074 [GenBank: DQ875932] and HRV-C43 
strain namely HRV-C43_pl 154_sRl 124_2009 [GenBank: 
JX074056]. Membership scores indicate no discrepancy in the 
assignment of HRV-C strains to their respective CI or C2 
subpopulations, however, assignment of HRV-A strains viz. HRV- 
A8, -A95, -A45 and -A12 was not resolved at K= 6. Hence analysis 
of clustering pattern at K= 7 was carried out. 

At K= 7, the subpopulation A2 is observed to further subdivide 
into another subpopulation named A3. This subpopulation 
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Figure 1. Determination of K opt using the plot of Kvs. JK. AK represents the rate of change of posterior probability given the number of 
clusters (K). AK\s plotted against K to determine optimum number of clusters (K opt ) within Rhinovirus population (comprising of HRV-A, -B and -C).The 
peak at K = 7, represents that K opt is 7 and thus indicate that Rhinovirus population is subdivided into seven genetically distinct subpopulations. 
doi:10.1371/journal.pone.0088981.g001 



includes three HRV-A serotypes viz. HRV-A8, -A95 and -A45. 
The two serotypes viz. HRV-A8 and -A95 show membership 
scores of 0.989 and 0.987, respectively, to belong to the 
subpopulation A3 while HRV-A45 show admixture from 
subpopulations A3 (membership score 0.43), Al (membership 
score 0.22) and pure A (membership score 0.20). Thus, the highest 
membership score suggests that HRV-A45 belongs to the A3 
subpopulation. However, HRV-A12 and -A78 showed highest 
membership scores to belong to A2 viz. 0.41 and 0.46, 
respectively. This suggests that HRV-A 12 and -A78 are members 
of A2 subpopulation. Figure 2 clearly shows subdivision of HRV-A 
population into four subpopulations such as A, Al, A2 and A3. 
Similarly, HRV-C population is subdivided into subpopulations 
CI and C2 while the Rhinovirus B population shows no further 
subdivisions. Similar analysis at K= 8 produced inconsistent 
clustering during repeated runs, which further confirmed that 
the Rhinovirus population is subdivided only in seven distinct 
subpopulations as also observed by plotting K versus AK (Figure 1). 

Subsequendy, analysis employing linkage model at K— 7 was 
carried out to confirm assignment of admixed strains to respective 
subpopulations. Linkage model also validated clustering of 
Rhinovirus strains into seven distinct genetic subpopulations (A, 



Al, A2, A3, B, CI and C2). The bar plot obtained by the linkage 
model is given in the Figure S 1 . The results obtained using linkage 
model substantiate that the admixed strains were correctiy 
identified by the admixture model except in case of few strains 
belonging to the subpopulation A2 and C2. The linkage model 
provided evidence of inter-subpopulation admixture in case of five 
HRV-C strains (HRV-C 3 [GenBank: EF186077], HRV-C 1 
[GenBank: EF077279], HRV-C10 [GenBank: GQ323774], 
HRV-C 7 [GenBank: DQ875932] and HRV-C43 [GenBank: 
JX074056]). These strains show admixture membership scores to 
belong to the subpopulations of HRV-A species, especially A2, in 
addition to the CI and C2 subpopulations. In case of subpopu- 
lation A2, linkage model uniquely helped to provide evidence of 
admixture amongst the members of subpopulation A2. 

Sublevel Clustering in Rhinovirus A 

In order to validate the subdivision of Rhinovirus A population 
into four subpopulations (A, -Al, -A2, -A3) and to study extent of 
intra-species admixture, the dataset consisting of 1 1 1 genomes of 
HRV-A strains were subjected to independent sublevel clustering 
using the STRUCTURE program. The plot of K versus AK 




A2 A3 



C2 CI 



Figure 2. Population structure of Rhinoviruses obtained by Bayesian-based clustering approach using admixture model at K= 7. HRV- 
A comprises of four subpopulations viz. -A (blue), -A1 (yellow), -A2 (red), -A3 (green). HRV-B members form a single cluster (magenta) with no further 
subdivision. HRV-C comprises of two subpopulations viz. C1 (orange) and C2 (cyan). The A1, A2, A3, C1 and C2 subpopulations show presence of 
several admixed strains. Admixed strains are color coded based on the proportion of membership scores to belong to the respective subpopulations. 
doi:1 0.1 371 /journal.pone.0088981 .g002 
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indicates highest value of K at 2 followed by a clear peak at K— 13 
(Figure 3). The clustering obtained at K— 2 confirmed that the 
Rhinovirus A population primarily divides into two subgroups. The 
first subgroup consists of the members of the subpopulation A 
while the second subgroup comprises of three subpopulations Al, 
A2 and A3. 

Presence of substructure within Rhinovirus A population at K— 1 3 
is validated using AMOVA test in ARLEQUIN software. F ST 
value of 0.42 (p = 0) obtained at K= 13 supports existence of 
thirteen sublevel subpopulations within Rhinovirus A population. 
The members belonging to thirteen subclusters are listed in Table 
S2. Interestingly, this result also suggests presence of Al, A2 and 
A3 as genetically distinct subpopulations. The subpopulation A2 is 
subdivided into two subclusters and members of subpopulation A 
are divided into nine distinct subclusters. No further subdivision of 
Al and A3 subpopulations was observed. The grouping of these 
subclusters is also in agreement with the topology of phylogenetic 
tree obtained using NJ-based method (described below). 

Sublevel Clustering in Rhinovirus C 

The plot of K versus AK (Figure 4) suggests an initial peak at 
K= 2 followed by two clear peaks at 4 and 9. The F ST value of 0.32 
(p = 0) was obtained for K= 4, which is lower than the cutoff for the 
significant F ST value The F ST of 0.47 (p = 0) obtained for K= 9 is 
statistically significant and provides evidence of sublevel genetic 
structure within HRV-C. This observation suggests presence of 
nine sublevel subpopulations in HRV-C . The C 1 subpopulation is 
subdivided into seven sublevel subpopulations while the C2 
subpopulation is divided into two sublevel subpopulations. It also 
implies that two subpopulations in HRV-C could be further 
diversified into multiple subpopulations (Table S3). However, 
genomic data of additional HRV-C types would be necessary to 
substantiate further subdivision of HRV-C . 

Molecular Phylogeny Analysis of Rhinovirus Population 

Phylogenetic analysis was carried out using Neighbor-joining 
(NJ), Maximum likelihood (ML) and Maximum parsimony (MP) 
methods and trees are shown as Figures 5, S2 and S3 respectively. 
The phylogenetic trees generated using all the three methods 

180 n 




2 3 4 5 6 7 8 9 10 11 12 13 14 



K 

Figure 3. Sublevel clustering of HRV-A: The plot of Kvs. JK obtained for HRV-A strains. AK represents the rate of change of posterior 
probability given the number of clusters (K). AK is plotted against K to determine optimum number of clusters (K opt ) within Rhinovirus A population. A 
major peak at K=2 and a minor peak at K= 13 is observed. It suggests that Rhinovirus A population primarily divides into two major groups. The 
minor peak at K= 13 indicates that Rhinovirus A population is further subdivided into 13 minor subpopulations. 
doi:1 0.1 371 /journal.pone.0088981 .g003 



revealed that there are seven major clusters, which correspond to 
the seven subpopulations (A, Al, A2, A3, B, CI and C2), obtained 
by the STRUCTURE program. Further, topologies of all the 
three trees divide HRV-A species into four subpopulations and 
HRV-C species into two subpopulations. The order of clustering 
of evolutionarily related serotypes within every subpopulation was 
also found to be similar in all the three trees barring the 
recombinant strains. As can be seen from Figure 5, the order of 
clustering of species in the NJ tree was HRV-A (innermost), HRV- 
C (intermediate) and HRV-B (outermost), which is in accordance 
with the whole genome-based NJ tree, published earlier [22]. 
However, the order of clustering of species in the trees generated 
using MP and ML methods was HRV-A (innermost), HRV-B 
(intermediate) and HRV-C (outermost). The interchange in the 
order of HRV-B and HRV-C clusters could be attributed to the 
underlying models of respective phylogenetic methods. Therefore, 
reconstruction of phylogeny using NJ, MP and ML methods was 
also carried out by including complete genomes of the type species 
of the genus Enterovirus as an outgroup. The outgroup included 
three serotypes of the type species Enterovirus C such as Human 
coxsackievirus A13 (GenBank: AF499637), Human coxsackievirus A21 
(GenBank: AF546702) and Human poliovirus 1 (GenBank: V01 149). 
The resultant MP and ML trees (not shown) depicted similar order 
of species clustering (HRV-A, -C and -B) as observed in both the 
NJ trees, without outgroup (Figure 5) and with outgroup (tree not 
shown). 

As mentioned above, differential placement was observed for 
recombinant/admixed strains such as HRV-A78, HRV-A12, 
HRV-A46, HRV-A80 and HRV-C 39 in all the three trees 
(Figures 5, S2 and S3). The molecular phylogenetic methods fail to 
resolve the placement of recombinant strains [25] . For example, 
two strains of HRV-A78 serotype, which belong to the subpop- 
ulation A2 were observed to share an immediate evolutionary 
ancestor with the members of subpopulation Al and A rather than 
A2, in both, NJ and ML trees (Figure 5 and S2). The population 
stratification results obtained using STRUCTURE program (using 
admixture model), suggest that HRV-A78 serotype is admixed 
with membership scores of 0.46 and 0.34 for A2 and pure A 
subpopulations, respectively. The membership scores show that 
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Figure 4. Sublevel clustering of HRV-C: The plot of A" vs. JK obtained for HRV-C strains. AK represents the rate of change of posterior 
probability given the number of clusters (K). AK\s plotted against Kto determine optimum number of clusters {K opt ) within Rhinovirus C population. A 
major peak at K= 2 and two minor peaks at 4 and 9 are observed. The major peak at K= 2 suggests that Rhinovirus C population primarily divided into 
two major groups (which correspond to the C1 and C2 subpopulations as mentioned in the text). The peak at K=9, represents optimum number of 
minor subpopulations within HRV-C based on significant F S j value of 0.47. 
doi:10.1371/journal.pone.0088981.g004 
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Figure 5. Phylogenetic tree of Rhinoviruses obtained using Neighbor-joining method in MEGA 5.05. Complete genome sequence data 
with 1 000 bootstrap replicates was used. The operational taxonomic unit (OTU) label consists of two parts divided by pipe ('|') character. The first part 
(before '|') indicates species-serotype and second part constitute GenBank accession number of the associated entry. The branches in the tree are 
color coded as per the seven subpopulations obtained using STRUCTURE program [Subpopulation A: blue, A1 : yellow, A2: red, A3: green, B: magenta, 
CI: orange, C2: cyan]. 
doi:1 0.1 371 /journal.pone.0088981 .g005 
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HRV-A78 strains are the members of A2 subpopulation, however, 
further sublevel clustering analysis of only HRV-A population 
using STRUCTURE program indicates that HRV-A78 strains 
form a distinct lineage, which might evolve independently over the 
period of time. 

The HRV-C39 strain, which belongs to the CI subpopulation 
(according to the STRUCTURE program), clusters with the 
members of C2 subpopulation in all the three trees (Figures 5, S2 
and S3). The membership scores of HRV-C39 suggest that it is an 
admixed strain having major membership to belong to CI 
subpopulation and minor membership to belong to C2 subpop- 
ulation. Thus, it is advisable to interpret the results of phylogenetic 
trees in the light of population genetics analysis and vice versa. 
The phylogenetic trees obtained using NJ and ML methods for 
proteome dataset (trees not shown) also corroborates with the NJ 
tree obtained using complete genomes (Figure 5). 

Recombination Analysis Using RDP4 & STRUCTURE 
Programs 

Extent of recombination in Rhinovirus genomes was analyzed 
using p-value cutoff of <0.00001 in RDP4 program [40]. The 
analysis revealed that there are a total of 46 potential recombinant 
strains (Table S4), out of which 23 recombinant strains were 
previously reported [22]. The admixed strains identified using the 
STRUCTURE program were further analyzed using the RDP4 
program. Both these programs suggest that the majority of the 
recombinants were the members of A2, C 1 and C2 subpopulations 
and there was a good agreement between results obtained by both 
the programs. For example, results obtained by RDP4 program 
suggest that strains belonging to HRV-A101 serotype are 
recombinants having HRV-A65 (which belongs to subpopulation 
A2) as a major parent and HRV-A78 (which belongs to 
subpopulation A) as a minor parent. Similarly, the STRUCTURE 
program identifies HRV-A101 strains as admixed (at A^ ( =7, 
linkage model) and assigns the membership scores of 0.81 and 0.16 
to belong to subpopulations A2 and A respectively. Thus, while 
the STRUCTURE program suggests potential contribution of 
respective subpopulations to form admixed individuals, the RDP4 
program helps to identify the extent of recombination from 
respective strains. The admixture(s) between two distinct subpop- 
ulations were correctly captured by the STRUCTURE program 
at K= 7 using linkage model. Furthermore, subpopulation A in 
itself constitutes potential recombinants having both the parents 
belonging to the subpopulation A itself. Such "within-subpopula- 
tion admixture" could also be correctly interpreted using the 
HRV-A sublevel clustering results obtained at K= 13. For 
example, analysis using the RDP4 program suggests that, one 
member of subpopulation A, namely, HRV-A38 is a potential 
recombinant of HRV-A9 (major parent) and HRV-A98 (minor 
parent), both of the parents belonging to subpopulation A. 
Therefore, HRV-A38 is shown (by the STRUCTURE program) 
to have membership of ~ 1 to belong to subpopulation A implying 
no admixture/recombination. On the other hand, sublevel 
clustering of HRV-A, at K= 1 3 revealed that HRV-A38 is 
admixed strain having membership scores of 0.164 and 0.154 to 
belong to sublevel HRV-A clusters represented by strains HRV- 
A9 and HRV-A98, respectively. 

The analysis carried out using RDP4 program also substanti- 
ated the evidence of recombination in two known minor receptor 
group viruses (HRV-A31, HRV-A47) and one potential minor 
receptor group strain (HRV-A N13). All the three viruses 
mentioned above have the HRV-A54 as a major parent (a 
member of major receptor group) and HRV-A25 as a minor 
parent (a member of minor receptor group). As can be seen in 



Figure 5, the minor group viruses form a single monophyletic 
cluster, which in turn clusters with their parent (HRV-A54) that 
belongs to major receptor group viruses. 

The subpopulation A3 consists of only three strains viz. HRV- 
A8, -A95 and -A45. According to the STRUCTURE program, 
HRV-A45 is an admixed strain while HRV-A8 and -A95 are not 
admixed. The HRV-A45 had membership scores for A2 (0.2 1 3), A 
(0.191) and A3 (0.448) subpopulations, respectively while HRV-A8 
and -A95 have membership score of 1 to belong to A3 
subpopulation. The analysis carried out using RDP4 program at 
p<0. 00001 reveals that HRV-A45 is not a recombinant whereas 
evidence of recombination in HRV-A45 was obtained at relaxed 
p-value (<0.001). It appears that data of additional strains of A3 
subpopulation would be required to confirm extent of recombi- 
nation events, if any. 

Analysis of recombination in HRV-C population using RDP4 
program revealed that eight out of nine strains of C2 subpopu- 
lation were recombinants (Table S4). Out of these eight 
recombinant strains, seven were shown to have a minor parent 
belonging to HRV-A species, viz. HRV-A- 101-vl [GenBank: 
GQ415052]. The recombination events were observed to have 
occurred in either the 5'UTR region or in the 5'UTR-polyprotein 
junction. The major parent of these recombinant strains however 
could not be ascertained using RDP4 program. The STRUC- 
TURE program using linkage model (at K= 7), however, provided 
significant evidence of inter-species admixture in five strains (out of 
seven strains identified by RDP4 program) belonging to the C2 
subpopulation viz. HRV-C 3 [GenBank: EF 18607 7], HRV-C 1 
[GenBank: EF077279], HRV-C 10 [GenBank: GQ323774], 
HRV-C 7 [GenBank: DQ875932] and HRV-C43 [GenBank: 
JX074056]. The outcome of the STRCUTURE program further 
indicates that these strains also show membership scores to belong 
to both CI and A2 subpopulations, in addition to the major 
subpopulation C2. It implies that these recombinant strains are 
likely to have a major parent belonging to the C2 subpopulation. 
Thus the C2 subpopulation/lineage is likely to be derived as a 
result of inter-species admixture between HRV-A and HRV-C 
strains. 

Selection Pressure Analysis: Evidence of Pervasive 
Positive Selection 

A total of 1 2 datasets were analyzed for the potential evidence of 
positive selection pressure. Only one codon viz. 881 was found to 
be positively selected when the dataset of complete coding 
sequences of polyprotein of all the three species was analyzed. 
This codon corresponds to the amino acid position 267 in VP1 
protein and is known to be a part of the antigenic site B [56]. This 
site is a part of neutralizing epitope in minor receptor group 
viruses. The strains of minor receptor group of viruses have amino 
acid viz. aspartic acid (D) at position 267, which is replaced by 
glycine (G) or serine (S), in major receptor group viruses. Similarly, 
an independent analysis of dataset consisting of VP1 also revealed 
the evidence of positive selection at the same codon, which 
corresponds to residue 267 of VP1. For this codon, the difference 
(dN-dS) of 29.23 (£ = 0.007) and dN/dS ratio of 2.720 (p = 0.0257) 
were obtained using SLAC and IFEL methods, respectively. 
However, no significant evidence (p<0.1) of pervasive positive 
selection was obtained for the remaining gene datasets. Overall, 
majority of the codon sites in all the 1 1 genes were significantly 
found to be under purifying selection. 
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Selection Pressure Analysis: Evidence of Episodic Positive 
Selection 

A total of 1 1 datasets corresponding to every gene were 
analyzed separately using MEME method [52]. Significant 
evidence of episodic positive selection (p<0.05) was obtained in 
a total of 29 codons. These codons are found within the coding 
regions of only three structural (VP1, VP2, VP3) and three non- 
structural (2A, 2C and 3D) proteins. The detail results are given in 
Table 1. 

A total of 12 codons within VP1 were found to be under 
significant episodic positive selection. Further mapping with the 
phenotypic properties revealed that out of the 12 codons, 8 codons 
were associated either with antigenicity or with LDLR-receptor 
binding site in minor receptor group viruses. The codons 86, 91 
and 95 (numbering according to HRV-A2 [GenBank: X02316]) 
are known to be part of antigenic site A [55] while the codons 268 



and 278 are the part of known antigenic site B [56] of the minor 
receptor group viruses. Similarly, the codons 88, 89 and 228 are 
known to occur in the LDLR footprint region [58]. The amino 
acid coded by codon 80 is reported to be within 0.04 A of the 
ICAM-binding site [70]. 

In case of VP2, four codons viz. 104, 235, 262 and 263 were 
found to be under significant episodic selection. The amino acid 
(serine) coded by the codon 235 precedes the known antigenic 
residue (alanine) that belongs to antigenic site C in the minor 
receptor group viruses [56]. 

The four codons viz. 171, 180, 234 and 236 in VP3 gene were 
found to be under significant episodic positive selection. The 
codon 180 in HRV-A2 (minor receptor group virus) corresponds 
to the codon 178 in HRV-B14 (major receptor group virus). In 
HRV-B14, proline at the 178 position is known to be the part of 
ICAM-binding site [70]. Similarly, the codons 234 and 236 in 



Table 1. The codons under episodic diversifying selection identified using MEME method are reported. 
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A total of 29 codon sites within Rhinovirus genome are significantly (p<0.05) found to be under episodic diversifying selection using MEME method [52]. The codon 
sites are numbered according to the HRV-A2 genome (GenBank: X02316). Majority of these sites are found to be within three structural (VP1, VP2 and VP3) while few 
sites also within non-structural (2A, 2C and 3D) proteins. This table reports the distribution of synonymous (a) and non-synonymous (p) substitution rates over sites as 
inferred by MEME. p- represents the maximum likelihood estimate (MLE) of the non-synonymous rate for the branch class with p<a. Pr[p = p-] represents the MLE of the 
proportion of sites evolving at p- p+ re presents the MLE of the unconstrained p non-synonymous rate. Pr[p = p+] represents The MLE of the proportion of sites evolving 
at p+. The p-value is derived using a mixture of % 2 distributions, and q-values are obtained using Sime's procedure, which controls the false discovery rate under the 
strict neutral null (likely to be conservative). 
doi:1 0.1 371 /journal.pone.0088981 .t001 
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HRV-A2 corresponds to the amino acid positions 233 and 235 in 
HRV-B 1 4 and are observed to be in close proximity of ICAM- 
binding site [70]. 

The 2A gene consists of four codons viz. 2, 44, 74 and 141 
under significant episodic positive selection. The codon 44 (GTA) 
corresponds to amino acid residue valine (V) in the protein 2A of 
HRV-A2 serotype. Three-dimensional structure of 2A protein is 
available only for HRV-A2 serotype (PDB ID: 2HRV) [71]. It has 
been reported that residues 40-52 form an inter-domain loop 
between N- and C-terminal domains. The codon 74 corresponds 
to the amino acid serine (S). This residue is a part of beta 112 
strand of the beta-barrel forming the C-terminal domain. The 
codon 141 (GAA) is a part of the C-terminal end for which three- 
dimensional structure is not yet solved [71]. 

In case of 2C gene, three codons viz. 72, 88 and 189 were 
identified to be under significant episodic positive selection. In 
Picornaviruses, 2C protein is one of the highly conserved proteins 
and is multifunctional. It is known to possess NTPase and ATPase 
activities. The NTPase domain contains three conserved motifs (A, 
B and C) which are characteristic of superfamily III of the 
NTPase /helicase proteins [72]. The codon 189 encodes amino 
acid residue which is adjacent to the known B-motif (formed by 
residues at positions 176-187) within NTPase domain. However, 
its structural mapping could not be analyzed due to the 
unavailability of its crystal structure. 

In 3D polymerase gene, two codons were identified to be under 
significant episodic positive selection viz. 258 and 347. Both of these 
codons encode amino acid residues which occur within functionally 
important domains. The codon 258 encodes for serine (S), which 
belongs to the inner fingers of the fingers domain and is observed to 
be within the loop region joining the two helices [73]. The codon 
347 encode for lysine (K), which belongs to the motif D, which is one 
of the highly conserved known motifs for oligonucleotide polymer- 
ases. The motif D (residues 338-355) is present within a helix (otl6) 
in the palm domain (residues 291-373) [73]. 

Episodic Diversifying Selection in Newly Emerging 
Lineages 

In order to check the evidence of episodic positive selection 
leading to the diversification of HRV-A, -B and -C into distinct 
lineages (as obtained by the STRUCTURE program), the Branch- 
site REL (BSR) analysis was performed. The three datasets 
consisting of complete polyprotein coding sequences of HRV-A, - 
B and -C were analyzed separately. This method identifies the 
nodes/branches in the trees that are under episodic diversifying 
selection pressure. For Rhinovirus A, BSR analysis provided 
significant evidence (p<0.05) of episodic diversifying selection on 
16 nodes/branches of the tree. The BSR method strongly 
indicates episodic diversifying selection acting on the node 7 (p< 
0.0001) which leads to the lineages Al, A2 and A3. Among all the 
16 branches, the branches under the node 7 showed the higher 
strength of episodic diversifying selection (Figure S4). The tree in 
the Figure S4 was generated by excluding the serotypes HRV-A53 
and HRV-A45, which represents the subpopulations A2 and A3 
respectively. Inclusion of these serotypes in this analysis also 
supports the evidence of episodic positive selection at node 7 but 
the proportion of sites under selection in these serotypes are so 
high that it affects the representation of the strength of episodic 
diversifying selection for other serotypes, as evident from resultant 
tree topologies (tree not shown). 

In case of HRV-B, BSR method strongly supports episodic 
diversifying selection acting on the node 4 (p<0.0001) which leads 
to the lineage comprising of the drug-resistant serotypes (see 



Figure S5). In case of HRV-C, no significant episodic diversifying 
selection was observed using the BSR method. 

Discussion 

A high mutation rate and intra-species recombination are the 
eminent evolutionary forces introducing genetic variability within 
various species belonging to the genus Enterovirus [74] to which 
Rhinoviruses belong. In general, RNA viruses are excellent systems 
to study evolution under the theoretical framework of population 
genetics because of comparatively higher mutation rate, short 
generation time, large population sizes and relatively smaller size 
of genomes [75]. 

Evidence of Population Structure and Role of 
Recombination in the Emergence of New Lineages in 
Rhinoviruses 

A number of attempts have been made to analyze genetic 
diversity of Rhinoviruses and emerging taxonomic lineages have 
been documented [12,22,60,76]. These studies were carried out 
using either subgenomic region(s) or limited data on complete 
genome sequences, especially for newly identified HRV-C strains. 
A comprehensive complete genome-based analysis was carried out 
to analyze population diversity of Rhinoviruses and to understand 
the evolution of diversifying lineages in HRV-A, -B and -C species. 

Sublevel structure analysis of HRV-A population (at K= 2) 
clearly distinguished members of subpopulation A from that of three 
additional subpopulations viz., Al, A2 and A3. This indicates that 
subpopulations Al, A2 and A3 initially shared common allele 
frequency at specific loci and diverged later. Population stratifica- 
tion analyses suggest that Al members showed the clearest signal of 
divergence (at K= 4) followed by divergence of A2 (at K= 5) and A3 
members (at K= 7) at subsequent values of K. The members of A3 
continue to cluster with A2 members (till K= 6) and thereby suggest 
that A3 shared common allele frequency with A2 and diverged 
subsequently as evident by the population diversification analysis at 
K„ pt = 7. The cluster A3 consisting of HRV-A8, -A45 and -A95 has 
been identified as a distinct clade. These strains were proposed 
earlier as potentially the fourth species of Rhinoviruses (HRV-D) 
based on the complete genomic analyses [22]. However, separation 
of Al and A2 subclusters prior to separation of A3 strongly 
emphasizes existence and consideration of Al, A2 and A3 
subpopulations as independently evolving lineages of HRV-A. 
Phylogenetic studies that were based on sequences of VP1 and 3D- 
pol proteins also support this subdivision of members of HRV-A 
into four genetically distinct lineages [23]. 

Population genetic structure analysis of Rhinoviruses reported 
here provides strong evidence for existence of seven genetically 
distinct subpopulations using admixture as well as linkage model. 
Admixture model tend to ignore physical linkage between loci and 
at times may under- or over-estimate the proportion of admixed 
individuals [26,33]. Hence, linkage model, which accounts for 
potential linkage, is applied to refine membership scores of 
admixed individuals [33]. In case of HRV-A population, intra- 
species recombination was observed especially in A, A2 and A3 
subpopulations. In these subpopulations, the assignments of almost 
all the admixed strains to belong to the respective subpopulations 
were observed to remain unchanged using both, admixture and 
linkage models. This observation supports evidence of intra-species 
recombination in these strains irrespective of any potential 
occurrence of the physical linkage between loci. The low values 
of linkage measures ( | D' | , r 2 and f^j also substantiate only weak 
evidence of linkage and supports for recombination in origin and 
evolution of Rhinovirus strains. 
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Recombination analyses (using both STRUCTURE and RDP4 
programs) suggest that majority of the A2 members are recombi- 
nants. HRV-A101 is derived through recombination between A2 
members HRV-A65 and -A78 and has been proposed as a new 
lineage within HRV-A [62] . Another A2 member, HRV-A28 is also 
reported to be a recombinant of A2 members such as HRV-A68 
and -A20. Similarly, HRV-A46 is also a recombinant of HRV-A53 
and -A80 (all three belong to the A2 subpopulation) [22] . Presence 
of highest number of recombinants, the emergence of newly 
proposed A3 members from A2 and subdivision of A2 subpopu- 
lation into two distinct subclusters observed using population 
stratification analysis supports the hypothesis that the members of 
the A2 subcluster are more likely to undergo frequent recombina- 
tion events leading to evolution of new strains/lineages of HRV-A. 

In case of HRV-C population, linkage model was found to be 
advantageous over the admixture model in detecting the inter- 
species admixture events in several HRV-C strains belonging to the 
C2 subpopulation. The complete genome based population 
structure analysis along with the results obtained using the RDP4 
program provide strong evidence of emergence of C2 subpopula- 
tion as a distinct lineage due to inter-species recombination between 
the HRV-A and HRV-C, mainly in the 5'UTR region. It 
corroborates well with the earlier observation that the inter-species 
recombination between the strains of HRV-A and HRV-C exists 
[60]. Furthermore, 5 '-UTR-polyprotein junction has been exper- 
imentally found to have potential to undergo recombination and is 
reported to be involved in the inter-species recombination among as 
well as between the members of Rhinoviruses and Human enterovirus 
(HEVs) species [77]. A recent study reports presence of three 
species-like clusters (Cot, C p and Cy) within HRV-C based on 
analysis of the dataset consisting of six proteins which are conserved 
across family Picomaviridae [76]. There are four strains of proposed 
Ca cluster viz., C026, NY-074, NAT001 and QPM, representing 
the types HRV-C6, -C7, -CI and -C3 respectively. Our analysis 
demonstrates that these strains belong to the C2 subpopulation. The 
strains that are proposed to form C(5 and Cy sub-species are 
grouped under CI subpopulation in our analysis. Thus, the 
population stratification studies based on complete genomes 
indicate that there are only two subpopulations of HRV-C which 
further could be subdivided into nine sublevel subpopulations. 
Additional genomic data and analysis is however required to 
conclusively arrive at subspecies organization of the HRV-C. 

Role of Episodic Positive Selection in Rhinovirus Evolution 
and Genotype-phenotype Correlations 

Evidence of genetic structure and diversifying lineages in 
Rhinoviruses led us to undertake a study to understand role of 
lineage specific episodic positive selection pressure, if any, using 
MEME method [52]. MEME method has also been used to study 
the emergence of genetically divergent lineages in Simian 
Immunodeficiency Virus (SIV) and Human Immunodeficiency Virus 
(HIV), Human Respiratory Syncytial Virus (RSV) and Foot and-mouth 
disease virus (FMDV) [78-80]. 

Substantial evidence for episodic diversifying selection was 
obtained and it appears to be responsible for diversification of 
HRV-A population into three additional lineages (Al, A2 and A3). 
Episodic positive selection is found to have operated both on regions 
coding for structural and non-structural proteins. In case of 
structural proteins, most of the sites under episodic diversifying 
selection were found to be associated with phenotypic properties 
such as antigenicity and/ or receptor-binding specificity in HRV-A 
and -B. Most of the antigenic sites that were mapped happened to 
be characteristics of minor receptor group viruses. These results also 
corroborate with the previously reported observation that minor 



receptor group viruses undergo positive destabilizing selection while 
major group viruses undergo positive stabilizing selection [81]. 

A recent study based on modeling of capsid of type HRV-C 15 
reports major structural alterations in the loop regions of VP1 
[82]. In HRV-C, the length of VP1 protein is shorter by 21 and 35 
amino acid residues as compared to HRV-A and -B respectively. 
These deletions lead to structural alterations by means of loss of 5- 
fold plateau in HRV-C virus particle, as against HRV-A and -B 
[82]. As a result, neutralizing immunization (Nim) sites namely 
Nim-IA (residues 91 and 95), Nim- IB (residues 82 and 85) are 
subjected to increased selection pressure in HRV-C. 

Different receptor usage by various genera of Picomaviridae as 
well as other virus families has been proposed to be a useful 
mechanism to escape antibody neutralization. Coevolution of 
receptor usage and antigenicity has also been observed in various 
picornaviruses such as poliovirus, FMDV as well as other RNA 
viruses [83]. Consequently, the viral capsid region(s) in general 
and the receptor attachment sites in particular are reported to be 
subjected to strong selection pressure [83,84], implying possible 
involvement of independent receptor(s) by HRV-C. It is interest- 
ing to note that two independent approaches involving selection 
pressure analysis reported in this paper and mapping of epitope 
data on three-dimensional structure [82] help to correlate 
genotype with phenotype. Analysis of population diversity along- 
with genotype-phenotype correlation studies is expected to play 
increasingly important role not only in designing new vaccines and 
drugs but also to in explaining emergence of drug resistance 
amongst viral subpopulations. 

Conclusions 

The population stratification studies based on complete 
genomes using the STRUCTURE program clearly demonstrates 
diversification of Rhinovirus population into seven distinct lineages. 
This study establishes that HRV-A and -C have further diversified 
into four and two distinct subpopulations, respectively. Besides role 
of intra-species recombination, an evidence of episodic positive 
selection in evolution of newly emerging lineages within HRV-A is 
also reported. Evolution of HRV-C seems to be driven by intra- 
species as well as inter-species recombination with HRV-A. This 
study adds new understanding of origin and evolution of emerging 
lineages within Rhinovirus population. It also furthers our knowl- 
edge about Rhinovirus taxonomic diversity and may lead to bring 
out their subspecies organization more clearly. 

Supporting Information 

Figure SI Population structure of Rhinoviruses ob- 
tained by Bayesian-based approach, using linkage 
model at K = 7. HRV-A comprises of four subpopulations, 
namely, A (yellow), A2 (orange), A3 (green). HRV-B members 
form a single cluster (blue) with no further subdivision. HRV-C 
comprises of two subpopulations, namely, C 1 (magenta) and C2 
(Cyan). The Al, A2, A3, CI and C2 subpopulations show the 
admixed strains. They are color-coded based on the proportion of 
membership scores with respective subpopulations. 
(TIF) 

Figure S2 Phylogenetic tree of Rhinoviruses obtained 
using Maximum likelihood method in MEGA 5.05. 

Complete genome sequence data with 1000 bootstrap replicates 
was used. The operational taxonomic unit (OTU) label consists of 
two parts divided by pipe (' | ') character. The first part (before ' | ') 
indicates species-serotype and second part constitute GenBank 
accession number of the associated entry. The branches in the tree 
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are color coded as per the seven subpopulations obtained using 
STRUCTURE program [Subpopulation A: blue, Al: yellow, A2: 
red, A3: green, B: magenta, CI: orange, C2: cyan]. 
(TIF) 

Figure S3 Phylogenetic tree of Rhinoviruses obtained 
using Maximum parsimony method in MEGA 5.05. 

Complete genome sequence data with 1000 bootstrap replicates 
was used. The operational taxonomic unit (OTU) label consists of 
two parts divided by pipe (' | ') character. The first part (before ' | ') 
indicates species-serotype and second part constitute GenBank 
accession number of the associated entry. The branches in the tree 
are color coded as per the seven subpopulations obtained using 
STRUCTURE program [Subpopulation A: blue, Al: yellow, A2: 
red, A3: green, B: magenta, CI: orange, C2: cyan]. Note: For the 
ease of readability of bootstrap and OTU labels, tree is shown in 
rectangular representation. 
(TIF) 

Figure S4 Evidence of episodic diversifying selection in 
Rhinovirus A obtained using Branch-site REL method. 

(PDF) 

Figure S5 Evidence of episodic diversifying selection in 
Rhinovirus B obtained using Branch-site REL method. 

(PDF) 

Table SI Complete genome sequence dataset of 179 
Rhinoviruses with GenBank accession numbers used in 
the study. 

(DOC) 
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